TMX Markup: A Challenge When Adapting SMT to the Localisation Environment
Abstract
Translation memory (TM) plays an important role in the localisation workflow and serves as an efficient, fundamental translation tool. In recent years, statistical machine translation (SMT) techniques have developed rapidly, and translation quality and speed have improved significantly. However, applying SMT to facilitate post-editing in the localisation industry requires adapting it to TM data that is formatted with special markup. In this paper, we explore some issues of adapting SMT to Symantec-formatted TM data. We propose three different methods for handling Translation Memory eXchange (TMX) markup and carry out a comparative study of them. We also compare the TMX-based SMT systems with a customised SYSTRAN system using human evaluation and automatic evaluation metrics. Experimental results on the French and English language pair show that SMT can perform well using TMX as the input format either during training or at runtime.
Similar resources
Seeding Statistical Machine Translation with Translation Memory Output through Tree-Based Structural Alignment
With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM) technology. In this paper we present a novel modular approach that utilise...
A Template-Based Markup Tool for Semantic Web Content
The Intelligence Community, among others, is increasingly using document metadata to improve document search and discovery on intranets and extranets. Document markup is still often incomplete, inconsistent, incorrect, and limited to keywords via HTML and XML tags. OWL promises to bring semantics to this markup to improve its machine understandability. A usable markup tool is becoming a barrier...
Domain Adaptation for Social Localisation-based SMT: A Case Study Using the Trommons Platform
Social localisation is a kind of community action, which matches communities and the content they need, and supports their localisation efforts. The goal of social localisation-based statistical machine translation (SL-SMT) is to support and bridge global communities exchanging any type of digital content across different languages and cultures. Trommons is an open platform maintained by The Ro...
Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment
In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity...
SMT-CAT integration in a Technical Domain: Handling XML Markup Using Pre & Post-processing Methods
The increasing use of eXtensible Markup Language (XML) is bringing additional challenges to statistical machine translation (SMT) and computer assisted translation (CAT) workflow integration in the translation industry. This paper analyzes the need to handle XML markup as a part of the translation material in a technical domain. It explores different ways of handling such markup by applying tra...
Publication date: 2010